home *** CD-ROM | disk | FTP | other *** search
Text File | 1995-03-15 | 9.9 KB | 218 lines | [TEXT/MMCC] |
- The Collection-Extensions library
- =================================
-
- Copyright (c) 1994 Carnegie Mellon University
- All rights reserved.
-
- The various modules in this library contain a few new types and operations
- which are compatible with the collection types specified in the Dylan
- Interim Reference Manual, but which are not part of that specification.
-
- Users of Mindy 1.1 should be aware that the string-searching functions of
- the string-search module have been moved to the String-extensions library.
- The collection-extensions module "string-search" has for that reason been
- renamed "vector-search". The semantics of the moved functions have been
- changed, so string-search users are advised to read the String-extensions
- documentation.
-
- It is to be expected that more collection types will appear in time, and
- they will likely be added to this library. This may also result in
- reorganizations which could force incompatible changes to the existing
- modules. We hope to minimize such imcompatibilities and, when forced to
- them, will include sufficient information to facilitate conversion of
- existing code.
-
- The particular modules provided at present are:
-
- heap
- Provides <heap>, a popular data structure for priority queues. The
- semantics are basically those of a sorted sequence, with particularly
- efficient implementations of add!, first, and pop (i.e.
- "remove-first").
- self-organizing-list
- Provides "self-organizing lists". These explicit key collections
- provide roughly the semantics of hash tables, but use a probabilistic
- implementation which provides O(n) worst case performance but can
- provide very fast constant time access in the best case.
- subseq
- Provides "subsequences", which represent an aliased reference to some
- part of an existing sequence. These are analogous to slices (in Ada or
- Perl) or displaced arrays (in Common Lisp). Subsequences are
- themselves subclasses of <sequence>, and can therefore be passed any
- <collection> or <sequence> operation.
- vector-search
- Provides a small assortment of specialized operations for searching and
- modifying <vector>s. These operations are analogous to existing
- collection operations but provide keywords and efficiency improvements
- which are meaningful only within the more limited domain.
-
- Heap
- ----
- A heap is an implementation of the abstract data type "sorted list". A heap
- is a sorted sequence of items. Most likely the user will end up writing
- something like
-
- define class <heap-item> (<object>);
- slot priority;
- slot data;
- end class <heap-item>;
-
- with appropriate methods defined for < and =. The user could, however, have
- simply a sorted list of integers, or have some item where the priority is
- an integral part of the item itself.
-
- make on heaps supports the less-than: keyword, which supply the heap's
- comparison and defaults to <.
-
- Heaps support all the usual sequence operations. The most useful ones:
-
- push(heap, item) => updated-heap
- pop(heap) => smallest-element
- first(heap) => smallest-element
- second(heap) => second-smallest-element
- add!(heap, item) => updated-heap
- sort, sort! => sorted-sequence
-
- These are all "efficient" operations (defined below). As with <deque>,
- push is another name for add!, and does exactly the same thing except that
- push doesn't accept any keywords. sort and sort! return a sequence that's
- not a heap. Not necessarily efficient but useful anyhow:
-
- add-new!(heap, item, #key test:, efficient:) => updated-heap
- remove!(heap, item, #key test:, efficient:) => updated-heap
- member?(heap, item, #key test:, efficient:) => <boolean>
-
- The efficient: keyword defaults to #f. If #t, it uses the
- random-iteration-protocol (which is considerably more efficient, but isn't
- really standard behavior, so it had to be optional). Conceivably most
- sequence methods could support such a keyword, but they don't yet.
-
- The user can use element-setter or the iteration protocol to change an item
- in the heap, but changing the priority of an item is an error and Bad
- Things(tm) will happen. No error will be signaled. Both of these
- operations are very inefficient.
-
- Heaps are NOT <stretchy-collection>s, although add! and pop can magically
- change the size of the heap.
-
- Efficiency: Approximate running times of different operations are given
- below: (N is the size of the heap)
-
- first, first-setter O(1)
- second (but not second-setter) O(1)
- size O(1)
- add! O(lg N)
- push O(lg N)
- pop(heap) O(lg N)
- sort, sort! O(N * lg N)
- forward-iteration-protocol
- setup: O(N)
- next-state: O(lg N)
- current-element: O(1)
- current-element-setter: O(N)
- backwards-iteration-protocol
- setup: O(N * lg N)
- next-state: O(1)
- current-element: O(1)
- current-element-setter: O(N)
- random-iteration-protocol
- setup: O(1)
- next-state: O(1)
- current-element: O(1)
- current-element-setter: O(1)
- element(heap, M) O(M*lg N + N)
- element-setter(value, heap, M) O(N + M*lg N + M)
-
- element, element-setter on arbitrary keys use the
- forward-iteration-protocol (via the inherited methods), and have
- accordingly bad performance.
-
- Self-Organizing-List
- --------------------
- The "Self Organizing List" is a "poor man's hash table". More precisely,
- <self-organizing-list> is a subclass of <dictionary> for which addition and
- retrieval are both linear in the worst case, but which use a probabilistic
- strategy which yields nearly constant time in the best case. (A
- <dictionary> is a subclass of <mutable-explicit-key-collection> and
- <stretchy-collection>, and is described in mindy.doc)
-
- Because they have a very low overhead, self-organizing lists may provide
- better peformance than hash tables in cases where references have a high
- degree of temporal locality. They may also be useful in situations where
- it is difficult to create a proper hash function.
-
- Define new self-organizing-lists with
-
- make(<self-organizing-list>, test: test)
-
- Test is expected to be an equality function. In particular, it is expected
- to satisfy the identity and transitivity requirements described in chapter
- 5. If not specified, test defaults to \==.
-
- Subseq
- ------
- <Subsequence> is a new subclass of <sequence>. A subsequence represents an
- aliased reference to some part of an existing sequence. Although they may
- be created by make (with required keywords source:, start: and end:) on one
- of the instantiable subclasses, they are more often created by calls of the
- form
-
- subsequence(sequence, start: 0, end: 3)
-
- where start: and end: are optional keywords which default to the beginning
- and end, respectively, of the source sequence. No other new operations are
- defined for subsequences, since all necessary operations are inherited from
- <sequence>.
-
- Because subsequences are aliased references into other sequences, several
- properties must be remembered:
-
- 1. The contents of a subsequence are undefined after any destructive
- operation upon the source sequence.
- 2. Destructive operations upon subsequences may be reflected in the
- source. The results of reverse! and sort! should be expected to affect
- the source sequence for vector subsequences.
-
- If the source sequences are instances of <vector> or <string>, then the
- implementation will use subclasses of <subsequence> which are also
- subclasses of <vector> or <string>.
-
- Efficiency notes:
-
- 1. The implementation tries to insure that subsequences of subsequences
- can be accessed as efficiently as the original subsequence. (For
- example, the result of
-
- subsequence(subsequence(source, start: 1), start: 2)
-
- would produce a subsequence identical to the one produced by
-
- subsequence(source, start: 3)
-
- 2. Vector subsequences, like all other vectors, implement constant time
- element access.
- 3. Non-vector subsequences may take non-constant time to create, but will
- provide constant-time access to the first element. This should produce
- the best performance provided some element of the subsequence is
- accessed at least once.
-
- Vector-Search
- -------------
- The "vector-search" module provides basic search and replace capabilities
- upon restricted subsets of <sequence> -- primarily <vector>. Exploiting
- the known properties of these types yields substantially better performance
- than can be achieved for sequences in general.
-
- The following functions are supplied:
-
- find-first-key vector predicate? #key start end failure => key
- Find the index of first element (after start but before end) of a
- vector which satisfies the given predicate. If no matching element is
- found, return failure. The defaults for start, end and failure are,
- respectively, 0, size(vector), and #f. This function is like
- find-key, but accepts start: and end: rather than skip:.)
-
- find-last-key vector predicate? #key start end failure => key
- This is like find-first-key, but goes backward from end.
-
-